本文在3D Point Cloud中介绍了一个新问题:很少示例实例分割。给定一些带注释的点云举例说明了目标类,我们的目标是在查询点云中细分该目标类的所有实例。这个问题具有广泛的实用应用,在重点实例分段注释非常昂贵的收集中。为了解决此问题,我们提出了测量形式 - 第一个用于3D点云实例分割的地球引导变压器。关键的想法是利用大地距离来应对LIDAR 3D点云的密度不平衡。 LIDAR 3D点云在物体表面附近茂密,在其他地方稀疏或空,使欧几里得距离较差以区分不同的物体。另一方面,大地测量距离更合适,因为它编码了场景的几何形状,该几何形状可以用作变压器解码器中注意机制的指导信号,以生成代表实例的不同特征的内核。然后将这些内核用于动态卷积以获得最终实例掩模。为了评估新任务上的测量形式,我们提出了两个常见的3D点云实例分割数据集的新拆分:ScannETV2和S3DIS。地球形式始终优于根据最新的3D点云实例分割方法的强大基线,并具有明显的余量。代码可从https://github.com/vinairesearch/geoformer获得。
translated by 谷歌翻译
Despite their widespread adoption, neural conversation models have yet to exhibit natural chat capabilities with humans. In this research, we examine user utterances as causes and generated responses as effects, recognizing that changes in a cause should produce a different effect. To further explore this concept, we have compiled and expanded upon a new dataset called CausalDialogue through crowd-sourcing. This dataset includes multiple cause-effect pairs within a directed acyclic graph (DAG) structure. Our analysis reveals that traditional loss functions can struggle to effectively incorporate the DAG structure, leading us to propose a causality-enhanced method called Exponential Maximum Average Treatment Effect (ExMATE) to enhance the impact of causality at the utterance level in training neural conversation models. To evaluate the effectiveness of this approach, we have built a comprehensive benchmark using the CausalDialogue dataset leveraging large-scale pre-trained language models, and have assessed the results through both human and automatic evaluation metrics for coherence, diversity, and agility. Our findings show that current techniques are still unable to effectively address conversational DAGs, and that the ExMATE method can improve the diversity and agility of conventional loss functions while maintaining coherence.
translated by 谷歌翻译
Adversarial machine learning has been both a major concern and a hot topic recently, especially with the ubiquitous use of deep neural networks in the current landscape. Adversarial attacks and defenses are usually likened to a cat-and-mouse game in which defenders and attackers evolve over the time. On one hand, the goal is to develop strong and robust deep networks that are resistant to malicious actors. On the other hand, in order to achieve that, we need to devise even stronger adversarial attacks to challenge these defense models. Most of existing attacks employs a single $\ell_p$ distance (commonly, $p\in\{1,2,\infty\}$) to define the concept of closeness and performs steepest gradient ascent w.r.t. this $p$-norm to update all pixels in an adversarial example in the same way. These $\ell_p$ attacks each has its own pros and cons; and there is no single attack that can successfully break through defense models that are robust against multiple $\ell_p$ norms simultaneously. Motivated by these observations, we come up with a natural approach: combining various $\ell_p$ gradient projections on a pixel level to achieve a joint adversarial perturbation. Specifically, we learn how to perturb each pixel to maximize the attack performance, while maintaining the overall visual imperceptibility of adversarial examples. Finally, through various experiments with standardized benchmarks, we show that our method outperforms most current strong attacks across state-of-the-art defense mechanisms, while retaining its ability to remain clean visually.
translated by 谷歌翻译
Pareto Front Learning (PFL) was recently introduced as an effective approach to obtain a mapping function from a given trade-off vector to a solution on the Pareto front, which solves the multi-objective optimization (MOO) problem. Due to the inherent trade-off between conflicting objectives, PFL offers a flexible approach in many scenarios in which the decision makers can not specify the preference of one Pareto solution over another, and must switch between them depending on the situation. However, existing PFL methods ignore the relationship between the solutions during the optimization process, which hinders the quality of the obtained front. To overcome this issue, we propose a novel PFL framework namely \ourmodel, which employs a hypernetwork to generate multiple solutions from a set of diverse trade-off preferences and enhance the quality of the Pareto front by maximizing the Hypervolume indicator defined by these solutions. The experimental results on several MOO machine learning tasks show that the proposed framework significantly outperforms the baselines in producing the trade-off Pareto front.
translated by 谷歌翻译
The introduction of high-quality image generation models, particularly the StyleGAN family, provides a powerful tool to synthesize and manipulate images. However, existing models are built upon high-quality (HQ) data as desired outputs, making them unfit for in-the-wild low-quality (LQ) images, which are common inputs for manipulation. In this work, we bridge this gap by proposing a novel GAN structure that allows for generating images with controllable quality. The network can synthesize various image degradation and restore the sharp image via a quality control code. Our proposed QC-StyleGAN can directly edit LQ images without altering their quality by applying GAN inversion and manipulation techniques. It also provides for free an image restoration solution that can handle various degradations, including noise, blur, compression artifacts, and their mixtures. Finally, we demonstrate numerous other applications such as image degradation synthesis, transfer, and interpolation.
translated by 谷歌翻译
In this work, we identify elements of effective machine learning datasets in astronomy and present suggestions for their design and creation. Machine learning has become an increasingly important tool for analyzing and understanding the large-scale flood of data in astronomy. To take advantage of these tools, datasets are required for training and testing. However, building machine learning datasets for astronomy can be challenging. Astronomical data is collected from instruments built to explore science questions in a traditional fashion rather than to conduct machine learning. Thus, it is often the case that raw data, or even downstream processed data is not in a form amenable to machine learning. We explore the construction of machine learning datasets and we ask: what elements define effective machine learning datasets? We define effective machine learning datasets in astronomy to be formed with well-defined data points, structure, and metadata. We discuss why these elements are important for astronomical applications and ways to put them in practice. We posit that these qualities not only make the data suitable for machine learning, they also help to foster usable, reusable, and replicable science practices.
translated by 谷歌翻译
Online Temporal Action Localization (On-TAL) aims to immediately provide action instances from untrimmed streaming videos. The model is not allowed to utilize future frames and any processing techniques to modify past predictions, making On-TAL much more challenging. In this paper, we propose a simple yet effective framework, termed SimOn, that learns to predict action instances using the popular Transformer architecture in an end-to-end manner. Specifically, the model takes the current frame feature as a query and a set of past context information as keys and values of the Transformer. Different from the prior work that uses a set of outputs of the model as past contexts, we leverage the past visual context and the learnable context embedding for the current query. Experimental results on the THUMOS14 and ActivityNet1.3 datasets show that our model remarkably outperforms the previous methods, achieving a new state-of-the-art On-TAL performance. In addition, the evaluation for Online Detection of Action Start (ODAS) demonstrates the effectiveness and robustness of our method in the online setting. The code is available at https://github.com/TuanTNG/SimOn
translated by 谷歌翻译
Recent studies on adversarial images have shown that they tend to leave the underlying low-dimensional data manifold, making them significantly more challenging for current models to make correct predictions. This so-called off-manifold conjecture has inspired a novel line of defenses against adversarial attacks on images. In this study, we find a similar phenomenon occurs in the contextualized embedding space induced by pretrained language models, in which adversarial texts tend to have their embeddings diverge from the manifold of natural ones. Based on this finding, we propose Textual Manifold-based Defense (TMD), a defense mechanism that projects text embeddings onto an approximated embedding manifold before classification. It reduces the complexity of potential adversarial examples, which ultimately enhances the robustness of the protected model. Through extensive experiments, our method consistently and significantly outperforms previous defenses under various attack settings without trading off clean accuracy. To the best of our knowledge, this is the first NLP defense that leverages the manifold structure against adversarial attacks. Our code is available at \url{https://github.com/dangne/tmd}.
translated by 谷歌翻译
我们解决了在均质半透明材料中建模光散射并估算其散射参数的问题。散射相函数是影响散射辐射分布的此类参数之一。它是在实践中建模的最复杂,最具挑战性的参数,通常使用经验相位函数。经验相函数(例如Henyey-Greenstein(HG)相位函数)通常会呈现,并限于特定的散射材料范围。这种限制引起了人们对目标材料通常未知的反向渲染问题的关注。在这种情况下,首选更通用的相位函数。尽管使用诸如Legendre多项式\ cite {Fowler1983}之类的基础中存在这种通用相位函数,但此相函数的逆渲染并不直接。这是因为基础多项式在某个地方可能是负面的,而相位函数不能。这项研究提出了一种新型的通用相位功能,可以避免此问题,并使用此阶段函数进行逆渲染应用。通过以MIE散射理论建模的广泛的材料对所提出的相函数进行了积极评估。通过模拟和现实世界实验评估了带有建议的相函数的散射参数估计。
translated by 谷歌翻译
成功的人工智能系统通常需要大量标记的数据来从文档图像中提取信息。在本文中,我们研究了改善人工智能系统在理解文档图像中的性能的问题,尤其是在培训数据受到限制的情况下。我们通过使用加强学习提出一种新颖的填充方法来解决问题。我们的方法将信息提取模型视为策略网络,并使用策略梯度培训来更新模型,以最大程度地提高补充传统跨凝结损失的综合奖励功能。我们使用标签和专家反馈在四个数据集上进行的实验表明,我们的填充机制始终提高最先进的信息提取器的性能,尤其是在小型培训数据制度中。
translated by 谷歌翻译